I. [illegible]

In response to the Office Action [illegible], claims [illegible] have been cancelled,
[illegible] amended or added. Claims 1-18 remain in the application. [illegible] of
the application, as amended, is requested.

II. Specification Objections

In paragraph (2) of the Office Action, the Abstract of the Disclosure was objected to as
exceeding 150 words in length.

Applicants' attorney submits herewith a replacement Abstract of the Disclosure to overcome
this objection.

III. Double Patenting Rejections

[illegible] created doctrine of obviousness-type double patenting [illegible]
claims 1-5 of U.S. Patent No. 6,175,957 B1. [illegible] 37 C.F.R. §1.321(c),
Applicants' attorney submits herewith a Terminal Disclaimer to
overcome these rejections.

IV. Prior Art Rejections

A. [illegible]

[illegible] claims 1, 4, 7, 10, 13 and 16 were rejected [illegible] as being
unpatentable over [illegible] Code Positioning [illegible] ACM [illegible] (Pettis)
in view of [illegible] (Tomko). However, [illegible] claim, but would be allowable if
rewritten [illegible] and a Terminal Disclaimer was submitted to overcome the
double-patenting rejection.

Applicants' attorney acknowledges the [illegible] of [illegible], but respectfully
traverses these rejections.
B. Applicants' Independent Claims

Independent claims 1, 7, and 13 are generally directed to a method of restructuring a
program comprising basic blocks for execution by a processor having a memory hierarchy
comprised of a plurality of levels. A Program Execution Graph (PEG) corresponding to a first level
of the memory hierarchy is constructed from control flow and frequency information from a profile
of the program. The PEG comprises a weighted undirected graph comprising nodes and edges,
wherein each node represents a basic block, each edge represents a transfer of control between a
pair of the basic blocks, each of the nodes has a weight equal to the size of the basic block
represented by the node, and each of the edges has a weight equal to a frequency of transition
between a pair of basic blocks represented by a pair of nodes connected by the edge. For the first
level of the memory hierarchy, the nodes of the PEG are partitioned into clusters, such that a sum
of weights of the edges whose endpoints are in different clusters is minimized, and such that, for any
cluster, a sum of weights of the nodes in the cluster is no greater than an upper bound
corresponding to a size of the first level of the memory hierarchy. The basic blocks are restructured
into contiguous code corresponding to the clusters, such that the basic blocks that communicate
extensively with each other are on a same level of the memory hierarchy, in order to reduce
communications between the basic blocks across the levels of the memory hierarchy.
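
For illustration only, the graph and the two partitioning constraints summarized above can be
sketched in Python. This is a minimal sketch under stated assumptions and is not the claimed
method: the function names, the sample profile, and the greedy heaviest-edge-first merge are all
hypothetical, and a greedy merge does not guarantee a minimum cut. It only shows what the node
weights, edge weights, cut weight, and per-cluster upper bound refer to.

```python
from collections import defaultdict


def build_peg(block_sizes, transitions):
    """Return (node_weights, edge_weights) for a weighted undirected graph."""
    edge_weights = defaultdict(int)
    for src, dst, count in transitions:
        if src != dst:  # a self-loop can never cross a cluster boundary
            edge_weights[frozenset((src, dst))] += count  # merge both directions
    return dict(block_sizes), dict(edge_weights)


def greedy_clusters(node_weights, edge_weights, upper_bound):
    """Merge clusters along the heaviest edges while each stays within upper_bound."""
    cluster_of = {b: b for b in node_weights}
    members = {b: [b] for b in node_weights}
    size = dict(node_weights)
    for edge, _w in sorted(edge_weights.items(), key=lambda kv: -kv[1]):
        a, b = (cluster_of[x] for x in sorted(edge))
        if a != b and size[a] + size[b] <= upper_bound:
            for m in members[b]:
                cluster_of[m] = a
            members[a].extend(members.pop(b))
            size[a] += size.pop(b)
    return cluster_of, size


def cut_weight(edge_weights, cluster_of):
    """Sum the weights of edges whose endpoints lie in different clusters."""
    return sum(w for edge, w in edge_weights.items()
               if len({cluster_of[b] for b in edge}) > 1)


# Hypothetical profile: block sizes and observed transfer-of-control counts.
sizes = {"A": 4, "B": 4, "C": 4, "D": 4}
profile = [("A", "B", 100), ("B", "A", 50), ("B", "C", 5), ("C", "D", 80)]
nodes, edges = build_peg(sizes, profile)
cluster_of, cluster_size = greedy_clusters(nodes, edges, upper_bound=8)
print(cut_weight(edges, cluster_of), cluster_size)
```

In this hypothetical profile, blocks A and B end up in one cluster and C and D in another, each
within the assumed upper bound of 8, and only the lightly used B-C edge crosses clusters.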

C. The Pettis Reference

Pettis describes profile-guided code positioning wherein execution profile data is used as
input into the compilation process. One prototype positions code based on whole procedures,
wherein it has the ability to move procedures into an order that is determined by a "closest is best"
strategy. Another prototype positions code based on basic blocks within procedures, wherein basic
blocks that would be better as straight line sequences are identified as chains and these chains are
then ordered according to branch heuristics.
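
For illustration only, the chain-forming step summarized above can be sketched in Python. This
is a simplified sketch and not the algorithm as published in Pettis: the function name and the
sample arcs are hypothetical, and the subsequent ordering of chains by branch heuristics is
omitted. Arcs are visited from heaviest to lightest, and two chains are joined only when the
arc runs from the last block of one chain to the first block of another.

```python
def form_chains(blocks, arcs):
    """Group basic blocks into straight-line chains.

    blocks: list of block names; arcs: list of (src, dst, count) tuples.
    """
    chain_of = {b: [b] for b in blocks}  # each block starts as its own chain
    for src, dst, _count in sorted(arcs, key=lambda a: -a[2]):
        left, right = chain_of[src], chain_of[dst]
        # Join only when src ends one chain and dst begins a different one.
        if left is not right and left[-1] == src and right[0] == dst:
            left.extend(right)
            for b in right:
                chain_of[b] = left
    # Collect each distinct chain once, in first-seen block order.
    seen, chains = set(), []
    for b in blocks:
        chain = chain_of[b]
        if id(chain) not in seen:
            seen.add(id(chain))
            chains.append(chain)
    return chains


# Hypothetical control-flow arcs with execution counts.
example_arcs = [("A", "B", 10), ("A", "C", 90), ("C", "D", 90), ("B", "D", 10)]
chains = form_chains(["A", "B", "C", "D"], example_arcs)
print(chains)
```

With these hypothetical counts, the frequently taken path A, C, D becomes one chain and the
rarely executed block B is left as a chain of its own.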

D. The Tomko Reference

Tomko describes profile driven weighted decomposition, wherein application and machine
specific information are used in conjunction with domain decomposition to achieve a level of
performance not possible with traditional domain decomposition methods. Application profiling
characterizes the performance of an application on a specific machine. A method is presented that



" 9 " G&C 30571.249-US-C1 

PAGE 13/20 * RCVD AT 6/9/2004 7:10:40 PM [Eastern Daylight Time] * SVR:USPTO-EFXRF-1/1 * DNIS:8729306 * CSID:+13106418798 * DURATION (mm-ss):05-50



06-09-2004 03:22PM FROM-Gates & Cooper LLP



+13106418798 



T-660 P. 014/020 F-782 



uses curve fitting of application profile data to calculate vertex and edge weights for use with
weighted graph decomposition algorithms.
E. [illegible]

[illegible] the Office Action states that [illegible]
of a procedure (basic block) [illegible]:

a) constructing a Program Execution Graph (PEG) corresponding to a first
level of the memory hierarchy (page [illegible], right column, lines 27-32) from control flow
and frequency information from a profile of the program, the PEG comprising a
weighted undirected graph [illegible] each node representing [illegible] (page 17,
right column, see Section [illegible]) [illegible] and the edges correspond to calls
[illegible] each node [illegible] also referred to as a basic block (page 21) [illegible]
a pair of basic blocks (see page 17, [illegible] calls between procedures) [illegible];

b) for the first level of the memory hierarchy, partitioning the nodes of the
PEG into clusters, such that a sum of weights of the edges whose endpoints are in
different clusters is minimized ([illegible] referring
to the partition: A, E-N-B-C-D-F-H, [illegible]
of Figure 4 in page 20), and such that, for any cluster, a sum of weights of the nodes
in the cluster is no greater than an upper bound corresponding to a size of the first
level of the memory hierarchy [illegible]; and

[illegible] contiguous code corresponding to the
clusters, such that the basic blocks that communicate extensively with each other are
[illegible] reduce communications [illegible].

[illegible] minimize [illegible].
Tomko discloses a graph that is included with vertex weights (node weights)
and edge weights for use with graph partitioning (Re Tomko, page 165, right
column, first paragraph). [illegible]
into [illegible] of disjoint [illegible] of the cut [illegible]
subset with the highest sum is close to [illegible] (section 3, Domain
Decomposition, [illegible]).

[illegible] it would have been obvious to a person of ordinary skill in the art
at the time of invention was made to combine the teaching of Pettis: partitioning
[illegible] with the teaching of Tomko:
partitioning [illegible] conform to the size limitation of certain
[illegible] for profiling.

Applicants' attorney disagrees. At the indicated locations, the combination of Pettis and
[illegible].

[illegible] of the PEG into clusters, such that a sum of weights of the edges whose endpoints
are in different clusters is minimized, and such that, for any cluster, a sum of weights of the
nodes in the cluster is no greater than an upper bound corresponding to a size of the
[illegible] at page 21, left column, fourth paragraph.

However, the cited location in Pettis merely describes the following:

After we determine the [illegible] between chains, we arrange them
so that the order [illegible] possible. If there is some freedom in
[illegible] existing chains by
[illegible]
determine is: [illegible]

The main test for the loop (basic block B) is now in the middle of the loop.

This portion of Pettis merely describes choosing chains connected to existing chains by the
[illegible]. However, nowhere does it describe partitioning the nodes of the PEG into clusters,
[illegible]
claimed invention.






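The Office Action also cites Pettis as teaching the limitations of Applicants' original
independent claims directed to "restructuring the basic blocks into contiguous code
corresponding to the clusters, such that the basic blocks that communicate
extensively with each other are on a same level of the memory hierarchy, in order to reduce
communications between the basic blocks across the levels of the memory hierarchy," at
page 21, section 4.3.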
However, the cited locations in Pettis merely describe the following:

Pettis: page 21, section 4.3

[illegible] other by means of [illegible] branches to insure that the code executes correctly. These
[illegible] the fall through after a basic block is no longer correct.
[illegible] unconditional branches. We could take the trouble to reverse the [illegible] of a
conditional branch and alter the [illegible] rather than just [illegible]
the optimizer already knows how to do this so we do not do [illegible]

At this point, the procedure is ready to be passed through the optimizer.

These portions of Pettis merely describe restructuring the basic block graph and
reconnecting basic blocks, including the inserting of any necessary branches to ensure that the code
executes correctly. However, nowhere do they describe restructuring the basic blocks into contiguous
code corresponding to the clusters, such that the basic blocks that communicate extensively with
each other are on a same level of the memory hierarchy, in order to reduce communications
between the basic blocks across the levels of the memory hierarchy, as recited in Applicants' claimed
invention.



Finally, the Office Action admits that Pettis does not teach "node weights" but asserts that
Pettis does teach basic block splitting for minimizing the size of the basic block at page 21, right
column, section 5, second paragraph, and that Tomko discloses a graph that is included with vertex
weights (node weights) and edge weights for use with graph partitioning, and that Tomko discloses a
method for partitioning the vertices (nodes) into the number of disjoint subsets so that the sum of
the vertex weights for the subset with the highest sum is close to the average sum, and the total cost
of the cut edge is minimized, at page 166, left column, section 3, third paragraph. According to the
Office Action, it would have been obvious to a person of ordinary skill in the art at the time of
invention was made to combine the teaching of Pettis, e.g., partitioning weighted graph where the
weights are in the edges, with the teaching of Tomko, e.g., partitioning the weighted graph where the
weights are with the vertices and edges, because doing so would yield balance loading and conform
to the size limitation of certain memory/register resources, thus would improve the performance of
compilations for profiling.

The cited locations in Pettis and Tomko are set forth below. 

Pettis: page 21, right column, section 5, second paragraph
Procedure Splitting is the process of separating the fluff basic blocks of a
procedure into a separate region in an attempt to minimize the size of
procedure. The benefits of procedure splitting magnify the concept of locality. By
producing smaller and denser primary procedures, more procedures can be
packed onto a single page. This should result in a further reduction of the page
working set size and the number of page and TLB misses.

Tomko: page 165, right column, first paragraph

Applications with irregular domains are a challenge to parallelize efficiently
and heuristic algorithms are used to partition them for parallel execution. Many such
applications are even more challenging due to the heterogeneity of their domains:
partitioning the application into equal size sub-domains does not
balance because the work load per vertex varies across the application domain.
Fortunately, several domain decomposition algorithms are available which allow
weighted graphs as input. These algorithms can potentially deal with such
heterogeneous applications. However, these domain decomposition algorithms are
only half of the solution; without an appropriately weighted input graph they can
not do an adequate job. We present a method that uses curve-fitting of application
profile data to calculate vertex and edge weights for use with weighted graph
partitioning algorithms. Our method is the first of which we are aware to solve the
problem of graph weight calculation. We demonstrate its potential on some
[illegible] of a production finite element application. Our method
[illegible] weighted multilevel algorithm reduced load imbalance from 52% to
less than 10% on the more imbalanced subroutine of the two in our case study.

Tomko: page 166, left column, section 3, third paragraph
The weighted graph partitioning problem can thus be stated as follows.
Given a graph with vertex and edge weights, partition the vertices into [illegible] number
of disjoint subsets such that sum of the vertex weights for the subset with the
highest sum is close to the average sum, and the total cost of the cut edges is minimized.
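
For illustration only, the two quantities named in the Tomko problem statement quoted above can
be computed for a given partition as follows. This is a minimal sketch under stated assumptions:
the function name and the sample graph are hypothetical, and the code evaluates a partition
rather than searching for one.

```python
def partition_quality(vertex_weights, edge_weights, part_of, k):
    """Return (highest subset sum, average subset sum, total cut-edge weight).

    vertex_weights: {vertex: weight}; edge_weights: {(u, v): weight};
    part_of: {vertex: subset index in range(k)}.
    """
    loads = [0] * k
    for vertex, weight in vertex_weights.items():
        loads[part_of[vertex]] += weight
    average = sum(loads) / k
    cut = sum(w for (u, v), w in edge_weights.items() if part_of[u] != part_of[v])
    return max(loads), average, cut


# Hypothetical weighted graph split into two subsets.
vw = {"a": 3, "b": 1, "c": 2, "d": 2}
ew = {("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 4}
quality = partition_quality(vw, ew, {"a": 0, "b": 0, "c": 1, "d": 1}, k=2)
print(quality)
```

Here the highest subset sum equals the average, so the subsets are balanced, and only the
lightest edge is cut.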
These portions of Pettis merely describe procedure splitting, while Tomko merely describes
a method that uses curve-fitting of application profile data to calculate vertex and edge weights for
use with weighted graph partitioning algorithms and the weighted graph partitioning itself.
However, nowhere do they describe: (1) partitioning the nodes of the PEG into clusters, such that a
sum of weights of the edges whose endpoints are in different clusters is minimized, and such that,
for any cluster, a sum of weights of the nodes in the cluster is no greater than an upper bound
corresponding to a size of the first level of the memory hierarchy, as recited in Applicants' claimed
invention; or (2) restructuring the basic blocks into contiguous code corresponding to the clusters,
such that the basic blocks that communicate extensively with each other are on a same level of the
memory hierarchy, in order to reduce communications between the basic blocks across the levels of
the memory hierarchy, as recited in Applicants' claimed invention. Moreover, these limitations would
not be obvious in view of the combination of Pettis and Tomko.

Thus, Applicants' attorney submits that independent claims 1, 7 and 13 are allowable over
Pettis and Tomko. Further, dependent claims 4, 7, 10 and 16 are submitted to be allowable over
Pettis and Tomko in the same manner, because they are dependent on independent claims 1, 7 and
13, respectively, and thus contain all the limitations of the independent claims. In addition,
dependent claims 4, 7, 10 and 16 recite additional novel elements not shown by Pettis and Tomko.

V. Conclusion

In view of the above, it is submitted that this application is now in good order for allowance
and such allowance is respectfully solicited.
Should the Examiner believe minor matters still remain that can be resolved in a telephone
interview, the Examiner is urged to call Applicants' undersigned attorney.

Respectfully submitted, 

GATES & COOPER LLP 
Attorneys for Applicants 

Howard Hughes Center 
6701 Center Drive West, Suite 1050 
Los Angeles, California 90045
(310) 641-8797 




Date: June 9, 2004

Name: George H. Gates

Reg. No.: 33,500

GHG/